Automated Taxonomy Generation for Summarizing Multi-Type Relational Datasets
Authors
Abstract
Taxonomy construction gives people an efficient mechanism for navigating and browsing by organizing large amounts of information into a small number of hierarchical clusters. Compared with manual taxonomy editing, Automated Taxonomy Generation (ATG) has numerous advantages and has therefore been applied to categorize document collections. However, the utility of this technique for organizing and representing relational datasets has not been investigated, because of its prohibitive computational complexity. In this paper we propose a new ATG method based on the relational clustering framework DIVA. Incorporating the idea of Representative Objects greatly reduces the computational complexity. Moreover, we analyze the divergence of the data attributes and label the taxonomic nodes accordingly. The quality of the derived taxonomy is quantitatively evaluated by a synthesized criterion that considers both intra-node homogeneity and inter-node heterogeneity. Theoretical analysis and experimental results show that our approach is comparable in effectiveness to other ATG algorithms and more efficient than them.
Similar resources
Labeling Nodes of Automatically Generated Taxonomy for Multi-type Relational Datasets
Automatic Taxonomy Generation organizes a large dataset into a hierarchical structure so as to facilitate people’s navigation and browsing actions. To better summarize the content of each node as well as to reflect the distinctiveness between sibling ones, meaningful labels need to be assigned to all the nodes within a derived taxonomy. Current research only focuses on labeling taxonomies that ...
Multi-type Relational Clustering Approaches: Current State-of-the-Art and New Directions
The proliferation of multi-type relational datasets in a number of important real-world applications and the limitations resulting from the transformation of such datasets to fit propositional data mining approaches have led to the emergence of the discipline of multi-type relational data mining. Clustering is an important unsupervised learning task aimed at discovering structure inherent in da...
Exploiting Domain Knowledge by Automated Taxonomy Generation in Recommender Systems
The effectiveness of incorporating domain knowledge into recommender systems to address their sparseness problem and improve their prediction accuracy has been discussed in many research works. However, this technique is usually restrained in practice because of its high computational expense. Although cluster analysis can alleviate the computational complexity of the recommendation procedure, ...
A Taxonomy of Meta-learning Techniques and Proposed Framework for Automated Landmarker Generation and Selection
Many different perspectives have been adopted regarding the form of learning labelled as metalearning, with little to no consensus as to a proper definition. As such, a general definition and taxonomy of meta-learning techniques is defined, segmenting meta-learning into two categories: mono-problem and multi-problem. A further taxonomy of multi-problem metalearning methods is then described, empha...
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since th...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008